Abstract

We revisit the information-theoretic analysis of bit-interleaved coded modulation (BICM) by modeling
the BICM decoder as a mismatched decoder. The mismatched decoding model is well-defined
for finite, yet arbitrary, block lengths, and naturally captures the channel memory among the bits 
belonging to the same symbol. We give two independent proofs of the achievability of the BICM 
capacity calculated by Caire et al. where BICM was modeled as a set of independent parallel binary- 
input channels whose output is the bitwise log-likelihood ratio. Our first achievability proof uses typical 
sequences, and shows that due to the random coding construction, the interleaver is not required. The 
second proof is based on the random coding error exponents with mismatched decoding, where the 
largest achievable rate is the generalized mutual information. We show that the generalized mutual 
information of the mismatched decoder coincides with the infinite-interleaver BICM capacity. We also 
show that the error exponent (and hence the cutoff rate) of the BICM mismatched decoder is upper
bounded by that of coded modulation and may thus be lower than in the infinite-interleaved model.
For binary reflected Gray mapping in Gaussian channels the loss in error exponent is small. We also 
consider the mutual information appearing in the analysis of iterative decoding of BICM with EXIT 
charts. We show that the corresponding symbol metric has knowledge of the transmitted symbol and the 
EXIT mutual information admits a representation as a pseudo-generalized mutual information, which is 
in general not achievable. A different symbol decoding metric, for which the extrinsic side information 
refers to the hypothesized symbol, induces a generalized mutual information lower than the coded 
modulation capacity. We also show how perfect extrinsic side information turns the error exponent of 
this mismatched decoder into that of coded modulation. 
I. Introduction

In the classical bit-interleaved coded modulation (BICM) scheme proposed by Zehavi in [1], 
the channel observation is used to generate decoding metrics for each of the bits of a symbol, 
rather than the symbol metrics used in Ungerboeck's coded modulation (CM) [2]. This decoder
is sub-optimal and non-iterative, but offers very good performance and is interesting from a 
practical perspective due to its low implementation complexity. In parallel, iterative decoders 
have also received much attention [3], [4], [5], [6], [7] thanks to their improved performance. 

Caire et al. [8] further elaborated on Zehavi's decoder and, under the assumption of an infinite- 
length interleaver, presented and analyzed a BICM channel model as a set of parallel independent 
binary-input output symmetric channels. Based on the data processing theorem [9], Caire et al. 
showed that the BICM mutual information cannot be larger than that of CM. However, and 
rather surprisingly a priori, they found that the cutoff rate of BICM might exceed that of CM 
[10]. The error exponents for the parallel-channel model were studied by Wachsmann et al. [11]. 

In this paper we take a closer look at the classical BICM decoder proposed by Zehavi and cast
it as a mismatched decoder [12], [13], [14]. The observation that the classical BICM decoder 
treats the different bits in a given symbol as independent, even if they are clearly not, naturally 
leads to a simple model of the symbol mismatched decoding metric as the product of bit decoding 
metrics, which are in turn related to the log-likelihood ratios. We also examine the BICM mutual 
information in the analysis of iterative decoding by means of EXIT charts [5], [6], [7], where 
the sum of the mutual informations across the parallel subchannels is used as a figure of merit 
of the progress in the iterative decoding process. 

This paper is organized as follows. Section II introduces the system model and notation.
Section III gives a proof of achievability of the BICM capacity, derived in [8] for the independent
parallel channel model, by using typical sequences. Section IV shows general results on the error
exponents, including the generalized mutual information and cutoff rate as particular instances.
The BICM error exponent (and in particular the cutoff rate) is always upper-bounded by that of
CM, as opposed to the corresponding exponent for the independent parallel channel model [8],
[11], which can sometimes be larger. In particular, Section IV-D studies the achievable rates of
BICM under mismatched decoding and shows that the generalized mutual information [12], [13],
[14] of the BICM mismatched decoder yields the BICM capacity. The section concludes with
some numerical results, including a comparison with the parallel-channel models. In general,
the loss in error exponent is negligible for binary reflected Gray mapping in Gaussian channels.
In Section V we turn our attention to the iterative decoding of BICM. First, we review how
the mutual information appearing in the analysis of iterative decoding of BICM with EXIT
charts, where the symbol decoding metric has some side knowledge of the transmitted symbol,
admits a representation as a pseudo-generalized mutual information. A different symbol decoding
metric, for which the extrinsic side information refers to the hypothesized symbol, induces a
generalized mutual information lower in general than the coded modulation capacity. Moreover,
perfect extrinsic side information turns the error exponent of this mismatched decoder into that
of coded modulation. Finally, Section VI draws some concluding remarks.

May 9, 2008 DRAFT



II. Channel Model and Code Ensembles 

A. Channel Model 

We consider the transmission of information by means of a block code $\mathcal{M}$ of length $N$. At
the transmitter, a message $m$ is mapped onto a codeword $\mathbf{x} = (x_1, \ldots, x_N)$, according to one
of the design options described later, in Section II-B. We denote this encoding function by $\phi$.
Each of the symbols $x$ is drawn from a discrete modulation alphabet $\mathcal{X} = \{x_1, \ldots, x_M\}$, with
$M = |\mathcal{X}|$ and $m = \log_2 M$ being the number of bits required to index a symbol.

We denote the output alphabet by $\mathcal{Y}$ and the channel output by $\mathbf{y} = (y_1, \ldots, y_N)$, with $y_k \in \mathcal{Y}$.
With no loss of generality, we assume the output is continuous,$^1$ so that the channel output $\mathbf{y}$
is related to the codeword $\mathbf{x}$ through a conditional probability density function $p(\mathbf{y}|\mathbf{x})$. Further,
we consider memoryless channels, for which

$$p(\mathbf{y}|\mathbf{x}) = \prod_{k=1}^{N} p(y_k|x_k), \tag{1}$$

where $p(y|x)$ is the channel symbol transition probability. Henceforth, we drop the words density
function in our references to $p(y|x)$. We denote by $X, Y$ the underlying random variables.
Similarly, the corresponding random vectors are $\mathbf{X} = \underbrace{(X, \ldots, X)}_{N \text{ times}}$ and
$\mathbf{Y} = \underbrace{(Y, \ldots, Y)}_{N \text{ times}}$, respectively drawn from the sets $\mathcal{X}^N$ and $\mathcal{Y}^N$.

$^1$ All our results are directly applicable to discrete output alphabets, by appropriately replacing integrals by sums.

A particularly interesting, yet simple, case is that of complex-plane signal sets in AWGN with
fully-interleaved fading, where $\mathcal{Y} = \mathbb{C}$ and

$$y_k = h_k \sqrt{\mathsf{snr}}\, x_k + z_k, \qquad k = 1, \ldots, N, \tag{2}$$

where $h_k$ are fading coefficients with average unit energy, $z_k$ are the complex zero-mean
unit-variance AWGN samples and $\mathsf{snr}$ is the signal-to-noise ratio (SNR). The decoder outputs an
estimate $\hat{m}$ of the message according to a given codeword decoding metric, which we denote
by $q(\mathbf{x}, \mathbf{y})$, as

$$\hat{m} = \arg\max_{m} q(\mathbf{x}_m, \mathbf{y}). \tag{3}$$

The codeword metrics we consider are the product of symbol decoding metrics $q(x, y)$, for
$x \in \mathcal{X}$ and $y \in \mathcal{Y}$, namely

$$q(\mathbf{x}, \mathbf{y}) = \prod_{k=1}^{N} q(x_k, y_k). \tag{4}$$

Assuming that the codewords have equal probability, this decoder finds the most likely codeword
as long as $q(x, y) = f(p(y|x))$, where $f(\cdot)$ is a one-to-one increasing function, i.e., as
long as the decoding metric is a one-to-one increasing mapping of the transition probability of
the memoryless channel. If the decoding metric $q(x, y)$ is not an increasing one-to-one function
of the channel transition probability, the decoder defined by Eq. (3) is a mismatched decoder [12],
[13], [14].

B. Code Ensembles 

1) Coded Modulation: In a coded modulation (CM) scheme $\mathcal{M}$, the encoder $\phi$ selects a
codeword of $N$ modulation symbols, $\mathbf{x}_m = (x_1, \ldots, x_N)$, according to the information message
$m$. The code is in general non-binary, as symbols are chosen according to a probability law
$p(x)$. Denoting the information message set by $\{1, \ldots, |\mathcal{M}|\}$, we have that the rate $R$ of this
scheme in bits per channel use is given by $R = K/N$, where $K = \log_2 |\mathcal{M}|$ denotes the number of
bits needed to represent every information message. At the receiver, a maximum metric decoder
$\psi$ (as in Eq. (3)) acts on the received sequence $\mathbf{y}$ to generate an estimate of the transmitted
message, $\psi(\mathbf{y}) = \hat{m}$. In coded modulation constructions, such as Ungerboeck's [2], the symbol
decoding metric is proportional to the channel transition probability, that is $q(x, y) \propto p(y|x)$; the
value of the proportionality constant is irrelevant, as long as it is not zero. Reliable communication
is possible at rates lower than the coded modulation capacity or CM capacity, denoted by $C^{\rm cm}_{\mathcal{X}}$ and
given by

$$C^{\rm cm}_{\mathcal{X}} = E\left[\log \frac{p(Y|X)}{\sum_{x' \in \mathcal{X}} p(x')\, p(Y|x')}\right]. \tag{5}$$

The expectation is done according to $p(x)p(y|x)$. We often consider a uniform input distribution
$p(x) = 2^{-m}$.

2) Bit-Interleaved Coded Modulation: In a bit-interleaved coded modulation scheme $\mathcal{M}$, the
encoder is restricted to be the serial concatenation of a binary code $\mathcal{C}$ of length $n = mN$
and rate $r = R/m$, a bit interleaver, and a binary labeling function $\mu : \{0, 1\}^m \to \mathcal{X}$ which
maps blocks of $m$ bits to signal constellation symbols. The codewords of $\mathcal{C}$ are denoted by
$\mathbf{b} = (b_1, \ldots, b_{mN})$. The portions of codeword allocated to the $j$-th bit of the label are denoted
by $\mathbf{b}_j = (b_j, b_{m+j}, \ldots, b_{m(N-1)+j})$. We denote the inverse mapping function for labeling position
$j$ as $b_j : \mathcal{X} \to \{0, 1\}$, that is, $b_j(x)$ is the $j$-th bit of symbol $x$. Accordingly, we now define the
sets $\mathcal{X}_b^j = \{x \in \mathcal{X} : b_j(x) = b\}$ as the set of signal constellation points $x$ whose binary label has
value $b \in \{0, 1\}$ in its $j$-th position. With some abuse of notation, we will denote by $B_1, \ldots, B_m$
and $b_1, \ldots, b_m$ the random variables and their corresponding realizations of the bits in a given
label position $j = 1, \ldots, m$.

The classical BICM decoder [1] treats each of the $m$ bits in a symbol as independent and uses
a symbol decoding metric proportional to the product of the a posteriori marginals $p(b_j = b|y)$.
More specifically, we have that

$$q(x, y) = \prod_{j=1}^{m} q_j\bigl(b_j(x), y\bigr), \tag{6}$$

where the $j$-th bit decoding metric $q_j(b, y)$ is given by

$$q_j\bigl(b_j(x) = b, y\bigr) = \sum_{x' \in \mathcal{X}_b^j} p(y|x'). \tag{7}$$

This metric is proportional to the transition probability of the output $y$ given the bit $b$ at position
$j$, which we denote for later use by $p_j(y|b)$,

$$p_j(y|b) = \frac{1}{|\mathcal{X}_b^j|} \sum_{x' \in \mathcal{X}_b^j} p(y|x'). \tag{8}$$

The set of $m$ probabilities $p_j(y|b)$ can be used as a departure point to define an equivalent
BICM channel model. Accordingly, Caire et al. defined a BICM channel [8] as the set of $m$
parallel channels having bit $b_j(x_k)$ as input and the bit log-metric (log-likelihood) ratio for the
$k$-th symbol

$$\Xi_{m(k-1)+j} = \log \frac{q_j\bigl(b_j(x_k) = 1, y_k\bigr)}{q_j\bigl(b_j(x_k) = 0, y_k\bigr)} \tag{9}$$

as output, for $j = 1, \ldots, m$ and $k = 1, \ldots, N$. This channel model is schematically depicted
in Figure 1. With infinite-length interleaving, the $m$ parallel channels were assumed to be
independent in [8], [11], or in other words, the correlations among the different subchannels
are neglected. For this model, Caire et al. defined a BICM capacity $C^{\rm bicm}_{\mathcal{X}}$, given by

$$C^{\rm bicm}_{\mathcal{X}} = \sum_{j=1}^{m} I(B_j; Y) = \sum_{j=1}^{m} E\left[\log \frac{p_j(Y|B_j)}{\frac{1}{2}\sum_{b'=0}^{1} p_j(Y|b')}\right], \tag{10}$$

where the expectation is taken according to $p_j(y|b)p(b)$, for $b \in \{0, 1\}$ and $p(b) = \frac{1}{2}$.
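The two capacities can be compared numerically. The following is a minimal sketch, not taken from the paper, under assumptions chosen only to keep it short: a real-valued 4-PAM constellation with Gray labels, the real channel y = sqrt(snr) x + z with z ~ N(0, 1) instead of the complex model of Eq. (2), uniform inputs, and a plain Riemann sum over y. The names `POINTS`, `GRAY` and `capacities` are hypothetical.

```python
import math

# Assumed toy setup: unit-energy 4-PAM with Gray labels (not from the paper).
POINTS = [a / math.sqrt(5.0) for a in (-3.0, -1.0, 1.0, 3.0)]
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]  # GRAY[i] is the label of POINTS[i]


def capacities(snr, labels=GRAY, num=4001, span=8.0):
    """Return (C_cm, C_bicm) in bits for uniform inputs via a Riemann sum over y."""
    s = [math.sqrt(snr) * x for x in POINTS]
    lo, hi = min(s) - span, max(s) + span
    dy = (hi - lo) / (num - 1)
    m = len(labels[0])
    c_cm = 0.0
    c_bicm = 0.0
    for n in range(num):
        y = lo + n * dy
        p = [math.exp(-0.5 * (y - sx) ** 2) / math.sqrt(2.0 * math.pi) for sx in s]
        p_y = sum(p) / len(p)  # output density for uniform inputs
        if p_y == 0.0:
            continue  # numerically empty region, contributes nothing
        # Eq. (5): average over x of p(y|x) log2(p(y|x) / p(y))
        c_cm += sum(px * math.log2(px / p_y) for px in p if px > 0.0) / len(p) * dy
        for j in range(m):
            for b in (0, 1):
                subset = [p[i] for i, lab in enumerate(labels) if lab[j] == b]
                p_jb = sum(subset) / len(subset)  # Eq. (8)
                if p_jb > 0.0:
                    # Eq. (10): p(b) = 1/2, and (1/2) sum_b' p_j(y|b') equals p(y)
                    c_bicm += 0.5 * p_jb * math.log2(p_jb / p_y) * dy
    return c_cm, c_bicm


if __name__ == "__main__":
    print(capacities(10.0))
```

For any labeling the second value should not exceed the first, in line with the data-processing argument of [8]; swapping `GRAY` for another labeling changes only the BICM value.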

In practice, due to complexity limitations, one might be interested in the following
lower-complexity version of (7),

$$q_j(b, y) = \max_{x \in \mathcal{X}_b^j} p(y|x). \tag{11}$$

In the log-domain this is known as the max-log approximation.
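The bit metrics above are straightforward to compute for a given channel observation. The sketch below is illustrative only and rests on assumptions that are not in the paper: a real-valued 4-PAM constellation with Gray labels and the real channel y = sqrt(snr) x + z with z ~ N(0, 1). The names `POINTS`, `LABELS`, `bit_metrics`, `symbol_metrics` and `log_metric_ratios` are hypothetical.

```python
import math

# Assumed toy setup (not from the paper): unit-energy 4-PAM, Gray labels,
# and y = sqrt(snr) * x + z with z ~ N(0, 1).
POINTS = [a / math.sqrt(5.0) for a in (-3.0, -1.0, 1.0, 3.0)]
LABELS = [(0, 0), (0, 1), (1, 1), (1, 0)]  # LABELS[i] is the label of POINTS[i]
M_BITS = 2


def likelihoods(y, snr):
    """Return p(y|x) for every constellation point x."""
    return [math.exp(-0.5 * (y - math.sqrt(snr) * x) ** 2) / math.sqrt(2.0 * math.pi)
            for x in POINTS]


def bit_metrics(y, snr, max_log=False):
    """Return q[j][b], using the sum of Eq. (7) or the maximum of Eq. (11)."""
    p = likelihoods(y, snr)
    combine = max if max_log else sum
    return [[combine([p[i] for i, lab in enumerate(LABELS) if lab[j] == b])
             for b in (0, 1)]
            for j in range(M_BITS)]


def symbol_metrics(q):
    """Eq. (6): product of the bit metrics selected by each symbol's label."""
    return [math.prod(q[j][lab[j]] for j in range(M_BITS)) for lab in LABELS]


def log_metric_ratios(q):
    """Eq. (9): log of q_j(1, y) over q_j(0, y) for each label position j."""
    return [math.log(qj[1] / qj[0]) for qj in q]


if __name__ == "__main__":
    q = bit_metrics(0.9, snr=4.0)
    print(log_metric_ratios(q), symbol_metrics(q))
```

Passing `max_log=True` switches from Eq. (7) to Eq. (11); since a maximum over non-negative terms never exceeds their sum, each max-log metric is at most the corresponding classical metric.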

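Summarizing, the decoder of $\mathcal{C}$ uses a mismatched metric, namely the product of the bit metrics
over all label positions and symbols, and outputs a binary codeword $\hat{\mathbf{b}}$ according to

$$\hat{\mathbf{b}} = \arg\max_{\mathbf{b} \in \mathcal{C}} \prod_{k=1}^{N} \prod_{j=1}^{m} q_j\bigl(b_j(x_k), y_k\bigr). \tag{12}$$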

III. Achievability of the BICM Capacity: Typical Sequences

In this section, we provide an achievability proof for the BICM capacity based on typical
sequences. The proof is based on the usual random coding arguments [9] with typical sequences,
with a slight modification to account for the mismatched decoding metric. This result is obtained
without resorting to an infinite interleaver to remove the correlation among the parallel
subchannels of the classical BICM model. We first introduce some notation needed for the proof.

We say that a rate $R$ is achievable if, for every $\epsilon > 0$ and for $N$ sufficiently large, there exists
an encoder, a demapper and a decoder such that $\frac{1}{N} \log_2 |\mathcal{M}| \ge R - \epsilon$ and $\Pr(\hat{m} \ne m) \le \epsilon$. We
define the joint probability of the channel output $y$ and the corresponding input bits $(b_1, \ldots, b_m)$
as

$$p(b_1, \ldots, b_m, y) = \Pr\bigl(B_1 = b_1, \ldots, B_m = b_m,\; y \le Y < y + dy\bigr), \tag{13}$$

for all $b_j \in \{0, 1\}$, $y$ and infinitely small $dy$. We denote the derived marginals by $p_j(b_j)$, for
$j = 1, \ldots, m$, and $p(y)$. The marginal distributions with respect to bit $B_j$ and $Y$ are special, and
are denoted by $p_j(b_j, y)$. We have then the following theorem.

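Theorem 1: The BICM capacity $C^{\rm bicm}_{\mathcal{X}}$ is achievable.

Proof: Fix an $\epsilon > 0$. For each $m \in \mathcal{M}$ we generate a binary codeword $\mathbf{b}_1(m), \ldots, \mathbf{b}_m(m)$
with probabilities $p_j(\mathbf{b}_j)$. The codebook is the set of all codewords generated with this method.

We consider a threshold decoder, which outputs $\hat{m}$ only if $\hat{m}$ is the unique integer satisfying

$$\bigl(\mathbf{b}_1(\hat{m}), \ldots, \mathbf{b}_m(\hat{m}), \mathbf{y}\bigr) \in \mathcal{B}_\epsilon, \tag{14}$$

where $\mathcal{B}_\epsilon$ is a set defined as

$$\mathcal{B}_\epsilon = \left\{ (\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) : \frac{1}{N} \sum_{j=1}^{m} \log \frac{p_j(\mathbf{b}_j, \mathbf{y})}{p_j(\mathbf{b}_j)\, p(\mathbf{y})} \ge \Delta_\epsilon \right\} \tag{15}$$

for $\Delta_\epsilon = \sum_{j=1}^{m} I(B_j; Y) - 3m\epsilon$. Otherwise, the decoder outputs an error flag.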

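The usual random coding argument [9] shows that the error probability, averaged over the
ensemble of randomly generated codes, $\bar{P}_e$, is upper bounded by

$$\bar{P}_e \le P_1 + (|\mathcal{M}| - 1) P_2, \tag{16}$$

where $P_1$ is the probability that the received $\mathbf{y}$ does not belong to the set $\mathcal{B}_\epsilon$,

$$P_1 = \sum_{(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \notin \mathcal{B}_\epsilon} p(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}), \tag{17}$$

and $P_2$ is the probability that another randomly chosen codeword would be (wrongly) decoded,
that is,

$$P_2 = \sum_{(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \in \mathcal{B}_\epsilon} p(\mathbf{y}) \prod_{j=1}^{m} p_j(\mathbf{b}_j). \tag{18}$$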

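First, we prove that $\mathcal{B}_\epsilon \supseteq \mathcal{A}_\epsilon(B_1, \ldots, B_m, Y)$, where $\mathcal{A}_\epsilon$ is the corresponding jointly typical
set [9]. By definition, the sequences $(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y})$ in the typical set satisfy (among other
constraints) the following

$$-\log p(\mathbf{y}) > N\bigl(H(Y) - \epsilon\bigr), \tag{19}$$

$$-\log p_j(\mathbf{b}_j) > N\bigl(H(B_j) - \epsilon\bigr), \qquad j = 1, \ldots, m, \tag{20}$$

$$-\log p_j(\mathbf{b}_j, \mathbf{y}) < N\bigl(H(B_j, Y) + \epsilon\bigr), \qquad j = 1, \ldots, m. \tag{21}$$

Here $H(\cdot)$ are the entropies of the corresponding random variables. Multiplying the last equation
by $(-1)$, and summing them, we have

$$\log \frac{p_j(\mathbf{b}_j, \mathbf{y})}{p_j(\mathbf{b}_j)\, p(\mathbf{y})} > N\bigl(H(B_j) + H(Y) - H(B_j, Y) - 3\epsilon\bigr) \tag{22}$$

$$= N\bigl(I(B_j; Y) - 3\epsilon\bigr), \tag{23}$$

where $I(B_j; Y)$ is the corresponding mutual information. Now, summing over $j = 1, \ldots, m$ we
obtain

$$\sum_{j=1}^{m} \log \frac{p_j(\mathbf{b}_j, \mathbf{y})}{p_j(\mathbf{b}_j)\, p(\mathbf{y})} > N\left(\sum_{j=1}^{m} I(B_j; Y) - 3m\epsilon\right) = N\Delta_\epsilon. \tag{24}$$

Hence, all typical sequences belong to the set $\mathcal{B}_\epsilon$, that is, $\mathcal{A}_\epsilon \subseteq \mathcal{B}_\epsilon$. This implies that $\mathcal{B}_\epsilon^c \subseteq \mathcal{A}_\epsilon^c$
and, therefore, the probability $P_1$ in Eq. (17) can be upper bounded as

$$P_1 \le \sum_{(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \notin \mathcal{A}_\epsilon} p(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \le \epsilon, \tag{25}$$

for $N$ sufficiently large. The last inequality follows from the definition of the typical set.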
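We now move on to $P_2$. For $(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \in \mathcal{B}_\epsilon$, and from the definition of $\mathcal{B}_\epsilon$, we have that

$$\frac{1}{N} \sum_{j=1}^{m} \log_2 \frac{p_j(\mathbf{b}_j|\mathbf{y})}{p_j(\mathbf{b}_j)} \ge \Delta_\epsilon. \tag{26}$$

Rearranging terms we have

$$\prod_{j=1}^{m} p_j(\mathbf{b}_j) \le 2^{-N\Delta_\epsilon} \prod_{j=1}^{m} p_j(\mathbf{b}_j|\mathbf{y}). \tag{27}$$

Therefore the probability $P_2$ in Eq. (18) can be upper bounded as

$$P_2 \le 2^{-N\Delta_\epsilon} \sum_{(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \in \mathcal{B}_\epsilon} p(\mathbf{y}) \prod_{j=1}^{m} p_j(\mathbf{b}_j|\mathbf{y}) \tag{28}$$

$$\le 2^{-N\Delta_\epsilon} \sum_{(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y})} p(\mathbf{y}) \prod_{j=1}^{m} p_j(\mathbf{b}_j|\mathbf{y}) \tag{29}$$

$$= 2^{-N\Delta_\epsilon} \sum_{(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y})} p(\mathbf{b}_1, \ldots, \mathbf{b}_m, \mathbf{y}) \tag{30}$$

$$= 2^{-N\Delta_\epsilon}. \tag{31}$$

Now we can write for $\bar{P}_e$,

$$\bar{P}_e \le P_1 + (|\mathcal{M}| - 1) P_2 \le \epsilon + |\mathcal{M}|\, 2^{-N\Delta_\epsilon} \le 2\epsilon, \tag{32}$$

for $|\mathcal{M}| = 2^{N(\Delta_\epsilon - \epsilon)}$ and large enough $N$. We conclude that for large enough $N$ there exist codes
such that

$$\frac{1}{N} \log_2 |\mathcal{M}| \ge \Delta_\epsilon - \epsilon = \sum_{j=1}^{m} I(B_j; Y) - (3m + 1)\epsilon, \tag{33}$$

and $\Pr(\hat{m} \ne m) \le \epsilon$. The rate $\sum_{j=1}^{m} I(B_j; Y)$ is thus achievable.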

To conclude, we verify that the BICM decoder is able to determine the probabilities required
for the decoding rule defining $\mathcal{B}_\epsilon$ in Eq. (15). Since the BICM decoder uses the metric $q_j(b_j, y) \propto
p_j(y|b_j)$, the log-metric-ratio, or equivalently the a posteriori bit probabilities $p_j(b_j|y)$, it can
also compute

$$\frac{p_j(b_j, y)}{p_j(b_j)\, p(y)} = \frac{p_j(b_j|y)}{p_j(b_j)}, \tag{34}$$

where the bit probabilities are known, $p_j(1) = p_j(0) = \frac{1}{2}$. ■

IV. Achievability of the BICM Capacity: Error Exponents, Generalized
Mutual Information and Cut-off Rate

A. Random coding exponent 

The behaviour of the average error probability of a family of randomly generated codes, decoded
with a maximum-likelihood decoder, i.e., for a decoding metric satisfying $q(x, y) = p(y|x)$,
was studied by Gallager in [15]. In particular, Gallager showed that the error probability decreases
exponentially with the block length $N$ according to a parameter called the error exponent,
provided that the code rate $R$ is below the channel capacity $C$.

For memoryless channels Gallager found [15] that the average error probability over the
random coding ensemble can be bounded as

$$\bar{P}_e \le \exp\bigl(-N (E_0(\rho) - \rho R)\bigr), \tag{35}$$

where $E_0(\rho)$ is the Gallager function, given by

$$E_0(\rho) = -\log \int_{y} \left( \sum_{x} p(x)\, p(y|x)^{\frac{1}{1+\rho}} \right)^{1+\rho} dy, \tag{36}$$

and $0 \le \rho \le 1$ is a free parameter. For a particular input distribution $p(x)$, the tightest bound is
obtained by optimizing over $\rho$, which determines the random coding exponent

$$E_r(R) = \max_{0 \le \rho \le 1} \bigl( E_0(\rho) - \rho R \bigr). \tag{37}$$

For uniform input distribution, we define the coded modulation exponent $E_0^{\rm cm}(\rho)$ as the
exponent of a decoder which uses metrics $q(x, y) = p(y|x)$, namely

$$E_0^{\rm cm}(\rho) = -\log E\left[ \left( \frac{1}{2^m} \sum_{x'} \left( \frac{p(Y|x')}{p(Y|X)} \right)^{\frac{1}{1+\rho}} \right)^{\rho} \right]. \tag{38}$$



Gallager's derivation can easily be extended to memoryless channels with generic codeword
metrics decomposable as a product of symbol metrics, that is $q(\mathbf{x}, \mathbf{y}) = \prod_{n=1}^{N} q(x_n, y_n)$. Details
can be found in [13]. The error probability is upper bounded by the expression

$$\bar{P}_e \le \exp\bigl(-N (E_0^q(\rho, s) - \rho R)\bigr), \tag{39}$$

where the generalized Gallager function $E_0^q(\rho, s)$ is given by

$$E_0^q(\rho, s) = -\log E\left[ \left( \sum_{x'} p(x') \frac{q(x', Y)^s}{q(X, Y)^s} \right)^{\rho} \right]. \tag{40}$$

The expectation is carried out according to the joint probability $p(y|x)p(x)$. For a particular
input distribution $p(x)$, the random coding error exponent $E_r^q(R)$ is then given by [13]

$$E_r^q(R) = \max_{0 \le \rho \le 1} \max_{s > 0} \bigl( E_0^q(\rho, s) - \rho R \bigr). \tag{41}$$

For the specific case of BICM, assuming uniformly distributed inputs and a generic bit metric
$q_j(b, y)$, we have that Gallager's generalized function $E_0^{\rm bicm}(\rho, s)$ is given by

$$E_0^{\rm bicm}(\rho, s) = -\log E\left[ \left( \frac{1}{2^m} \sum_{x'} \prod_{j=1}^{m} \frac{q_j(b_j(x'), Y)^s}{q_j(b_j(X), Y)^s} \right)^{\rho} \right]. \tag{42}$$

For completeness, we note that the cutoff rate is given by $R_0 = E_0(1)$ and, analogously, we
define the generalized cutoff rate as

$$R_0^q = E_r^q(R = 0) = \max_{s > 0} E_0^q(1, s). \tag{43}$$
B. Data processing inequality for error exponents

In [13], it was proved that the data-processing inequality holds for error exponents, in the
sense that for a given input distribution we have that $E_0^q(\rho, s) \le E_0(\rho)$ for any $s > 0$. Next, we
rederive this result by extending Gallager's reasoning in [15] to mismatched decoding.

The generalized Gallager function $E_0^q(\rho, s)$ in Eq. (40) can be expressed as

$$E_0^q(\rho, s) = -\log \left( \int_{y} \sum_{x} p(x)\, p(y|x) \left( \sum_{x'} p(x') \left( \frac{q(x', y)}{q(x, y)} \right)^{s} \right)^{\rho} dy \right). \tag{44}$$

As long as the metric does not depend on the transmitted symbol $x$, the function inside the
logarithm can be rewritten as

$$\int_{y} \left( \sum_{x} p(x)\, p(y|x)\, q(x, y)^{-s\rho} \right) \left( \sum_{x'} p(x')\, q(x', y)^{s} \right)^{\rho} dy. \tag{45}$$

For a fixed channel observation $y$, the integrand is reminiscent of the right-hand side of
Hölder's inequality (see Exercise 4.15 of [15]), which can be expressed as

$$\left( \sum_{i} a_i b_i \right)^{1+\rho} \le \left( \sum_{i} a_i^{1+\rho} \right) \left( \sum_{i} b_i^{\frac{1+\rho}{\rho}} \right)^{\rho}. \tag{46}$$

Hence, with the identifications

$$a_i = p(x)^{\frac{1}{1+\rho}}\, p(y|x)^{\frac{1}{1+\rho}}\, q(x, y)^{\frac{-s\rho}{1+\rho}}, \tag{47}$$

$$b_i = p(x)^{\frac{\rho}{1+\rho}}\, q(x, y)^{\frac{s\rho}{1+\rho}}, \tag{48}$$

we can lower bound Eq. (45) by the quantity

$$\int_{y} \left( \sum_{x} p(x)\, p(y|x)^{\frac{1}{1+\rho}} \right)^{1+\rho} dy. \tag{49}$$

Recovering the logarithm in Eq. (44), for a general mismatched decoder, arbitrary $s > 0$ and
any input distribution, we obtain that

$$E_0(\rho) \ge E_0^q(\rho, s). \tag{50}$$

Note that the expression in Eq. (49) is independent of $s$ and of the specific decoding metric
$q(x, y)$. Nevertheless, evaluation of Gallager's generalized function for the specific choices
$s = \frac{1}{1+\rho}$ and $q(x, y) \propto p(y|x)$ attains the lower bound, which is also Eq. (38).

Equality holds in Hölder's inequality if and only if, for all $i$ and some positive constant $c$,
$a_i^{1+\rho} = c\, b_i^{\frac{1+\rho}{\rho}}$ (see Exercise 4.15 of [15]). In our context, simple algebraic manipulations show
that the necessary condition for equality to hold is that

$$p(y|x) = c'\, q(x, y)^{s'} \quad \text{for all } x \in \mathcal{X} \tag{51}$$

for some constants $c'$ and $s'$. In other words, the metric $q(x, y)$ must be proportional to a power
of the channel transition probability $p(y|x)$ for the bound (50) to be tight and, therefore, to
achieve the coded modulation error exponent.
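The inequality and its equality condition can be checked numerically on a small example. The following is a minimal sketch under the assumption of a discrete output alphabet, so that the integrals over y become sums (as the footnote in Section II-A allows). The function names and the channel and metric matrices in the usage part are hypothetical, chosen only for illustration.

```python
import math


def gallager_e0(p_x, w, rho):
    """E_0(rho) of Eq. (36) for a discrete channel with w[x][y] = p(y|x)."""
    total = 0.0
    for y in range(len(w[0])):
        inner = sum(p_x[x] * w[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(w)))
        total += inner ** (1.0 + rho)
    return -math.log(total)


def generalized_e0(p_x, w, q, rho, s):
    """E_0^q(rho, s) of Eq. (44) for a strictly positive symbol metric q[x][y]."""
    total = 0.0
    for x in range(len(w)):
        for y in range(len(w[0])):
            inner = sum(p_x[x2] * (q[x2][y] / q[x][y]) ** s for x2 in range(len(w)))
            total += p_x[x] * w[x][y] * inner ** rho
    return -math.log(total)


if __name__ == "__main__":
    # Arbitrary illustrative channel and (mismatched) metric.
    w = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
    q = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
    p_x = [1.0 / 3.0] * 3
    rho = 0.5
    print(gallager_e0(p_x, w, rho))                         # matched exponent
    print(generalized_e0(p_x, w, q, rho, 1.0))              # mismatched, never larger
    print(generalized_e0(p_x, w, w, rho, 1.0 / (1 + rho)))  # matched metric, equality
```

The last call uses the matched metric with s = 1/(1 + rho), which is the case in which the bound is met with equality.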

C. Error exponent for BICM with the parallel-channel model 

In their analysis of multilevel coding and successive decoding, Wachsmann et al. provided
the error exponents of BICM modelled as a set of parallel channels [11]. The corresponding
Gallager's function, which we denote by $E_0^{\rm ind}(\rho)$, is given by

$$E_0^{\rm ind}(\rho) = -\sum_{j=1}^{m} \log \int_{y} \sum_{b_j=0}^{1} p_j(b_j)\, p_j(y|b_j) \left( \sum_{b_j'=0}^{1} p_j(b_j') \frac{q_j(b_j', y)^{\frac{1}{1+\rho}}}{q_j(b_j, y)^{\frac{1}{1+\rho}}} \right)^{\rho} dy, \tag{52}$$

which corresponds to a binary-input channel with input $b_j$, output $y$ and bit metric matched to
the transition probability $p_j(y|b_j)$.

This equation can be rearranged into a form similar to the one given in previous sections.
First, we insert the summation in the logarithm,

$$E_0^{\rm ind}(\rho) = -\log \left( \prod_{j=1}^{m} \int_{y} \sum_{b_j=0}^{1} p_j(b_j)\, p_j(y|b_j) \left( \sum_{b_j'=0}^{1} p_j(b_j') \frac{q_j(b_j', y)^{\frac{1}{1+\rho}}}{q_j(b_j, y)^{\frac{1}{1+\rho}}} \right)^{\rho} dy \right). \tag{53}$$

Then, we notice that the output variables $y$ are dummy variables which possibly vary for each
value of $j$. Let us denote the dummy variable in the $j$-th subchannel by $y_j'$. We have then

$$E_0^{\rm ind}(\rho) = -\log \left( \prod_{j=1}^{m} \int_{y_j'} \sum_{b_j=0}^{1} p_j(b_j)\, p_j(y_j'|b_j) \left( \sum_{b_j'=0}^{1} p_j(b_j') \frac{q_j(b_j', y_j')^{\frac{1}{1+\rho}}}{q_j(b_j, y_j')^{\frac{1}{1+\rho}}} \right)^{\rho} dy_j' \right) \tag{54}$$

$$= -\log \left( \int_{\mathbf{y}'} \sum_{x} p(x)\, p(\mathbf{y}'|x) \left( \sum_{x'} p(x') \frac{q(x', \mathbf{y}')^{\frac{1}{1+\rho}}}{q(x, \mathbf{y}')^{\frac{1}{1+\rho}}} \right)^{\rho} d\mathbf{y}' \right). \tag{55}$$

Here we carried out the multiplications, defined the vector $\mathbf{y}'$ to be the collection of the $m$
channel outputs, and denoted by $x = \mu(b_1, \ldots, b_m)$ and $x' = \mu(b_1', \ldots, b_m')$ the symbols selected
by the bit sequences. This equation is the Gallager function of a mismatched decoder for a
channel output $\mathbf{y}'$, such that each of the $m$ subchannels sees a channel realization that is
statistically independent from the others.

In general, since the original channel cannot be decomposed into parallel, conditionally
independent subchannels, this parallel-channel model fails to capture the statistics of the channel.

The cut-off rate with the parallel-channel model is given by

$$R_0^{\rm ind} = -\sum_{j=1}^{m} \log \int_{y} \left( \sum_{b_j=0}^{1} p_j(b_j)\, p_j(y|b_j)^{\frac{1}{2}} \right)^{2} dy. \tag{56}$$

The cutoff rate was given in [8] as $m$ times the cutoff rate of an averaged channel,

$$R_0^{\rm av} = m \left( \log 2 - \log \left( 1 + \frac{1}{m} \sum_{j=1}^{m} E\left[ \sqrt{ \frac{\sum_{x' \in \mathcal{X}^j_{\bar{B}}} p(Y|x')}{\sum_{x' \in \mathcal{X}^j_{B}} p(Y|x')} } \right] \right) \right). \tag{57}$$

From Jensen's inequality one easily obtains that $R_0^{\rm av} \le R_0^{\rm ind}$.
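The Jensen step can be illustrated with a short computation. This is a sketch under two assumptions that go beyond the text: the bit subchannels are discrete-output, and the averaged-channel cutoff rate of [8] is read as m(log 2 - log(1 + B)), with B the Bhattacharyya factor averaged over the m label positions. The function names are hypothetical.

```python
import math


def bhattacharyya(p0, p1):
    """Sum over y of sqrt(p(y|0) p(y|1)) for a binary-input discrete channel."""
    return sum(math.sqrt(a * b) for a, b in zip(p0, p1))


def cutoff_rate_ind(bit_channels):
    """Eq. (56) with uniform bits, evaluated directly (in nats)."""
    total = 0.0
    for p0, p1 in bit_channels:
        integral = sum((0.5 * math.sqrt(a) + 0.5 * math.sqrt(b)) ** 2
                       for a, b in zip(p0, p1))
        total -= math.log(integral)
    return total


def cutoff_rate_av(bit_channels):
    """Assumed averaged-channel form: m (log 2 - log(1 + mean Bhattacharyya factor))."""
    m = len(bit_channels)
    mean_bc = sum(bhattacharyya(p0, p1) for p0, p1 in bit_channels) / m
    return m * (math.log(2.0) - math.log(1.0 + mean_bc))


if __name__ == "__main__":
    # Two illustrative binary symmetric subchannels with different reliabilities.
    channels = [([0.9, 0.1], [0.1, 0.9]), ([0.6, 0.4], [0.4, 0.6])]
    print(cutoff_rate_av(channels), cutoff_rate_ind(channels))
```

Because the logarithm is concave, the averaged value never exceeds the parallel-channel value, and the two coincide when all subchannels have the same Bhattacharyya factor.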



D. Generalized mutual information for BICM 

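The largest achievable rate with mismatched decoding is not known in general. Perhaps the
easiest candidate to deal with is the generalized mutual information (GMI) [12], [13], [14], given
by

$$I_{\rm gmi} \triangleq \sup_{s > 0} I_{\rm gmi}(s), \tag{58}$$

where

$$I_{\rm gmi}(s) = E\left[ \log \frac{q(X, Y)^s}{\sum_{x' \in \mathcal{X}} p(x')\, q(x', Y)^s} \right]. \tag{59}$$

As in the case of matched decoding, this definition can be recovered from the error exponent,

$$I_{\rm gmi}(s) = \left. \frac{dE_0^q(\rho, s)}{d\rho} \right|_{\rho = 0} = \lim_{\rho \to 0} \frac{E_0^q(\rho, s)}{\rho}. \tag{60}$$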

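We next see that the generalized mutual information is equal to the BICM capacity of [8]
when the metric (7) is used. Similarly to Section III, the result does not require the presence of
an interleaver of infinite length. Further, the interleaver is actually not necessary for the random
coding arguments. First, we have the following.

Theorem 2: The generalized mutual information of the BICM mismatched decoder is equal to
the sum of the generalized mutual informations of the independent binary-input parallel channel
model of BICM,

$$I_{\rm gmi} = \sup_{s > 0} \sum_{j=1}^{m} E\left[ \log \frac{q_j(B_j, Y)^s}{\frac{1}{2} \sum_{b'=0}^{1} q_j(b', Y)^s} \right]. \tag{61}$$

The expectation is carried out according to the joint distribution $p_j(b_j)p_j(y|b_j)$, with $p_j(b_j) = \frac{1}{2}$.

Proof: For a given $s$, and uniform inputs, i.e., $p(x) = \frac{1}{2^m}$, Eq. (59) gives

$$I_{\rm gmi}(s) = E\left[ \log \frac{\prod_{j=1}^{m} q_j(b_j(X), Y)^s}{\frac{1}{2^m} \sum_{x'} \prod_{j=1}^{m} q_j(b_j(x'), Y)^s} \right]. \tag{62}$$

We now have a closer look at the denominator in the logarithm of (62). The key observation
here is that the sum over the constellation points of the product over the binary label positions
can be expressed as the product over the label positions of the sum of the bit metrics for the bits
being zero and one, i.e.,

$$\frac{1}{2^m} \sum_{x'} \prod_{j=1}^{m} q_j(b_j(x'), Y)^s = \frac{1}{2^m} \prod_{j=1}^{m} \bigl( q_j(b_j = 0, Y)^s + q_j(b_j = 1, Y)^s \bigr) \tag{63}$$

$$= \prod_{j=1}^{m} \frac{1}{2} \bigl( q_j(b_j = 0, Y)^s + q_j(b_j = 1, Y)^s \bigr). \tag{64}$$

Rearranging terms in (62) we obtain

$$I_{\rm gmi}(s) = E\left[ \sum_{j=1}^{m} \log \frac{q_j(b_j(X), Y)^s}{\frac{1}{2} \bigl( q_j(b_j = 0, Y)^s + q_j(b_j = 1, Y)^s \bigr)} \right] \tag{65}$$

$$= \sum_{j=1}^{m} \frac{1}{2} \sum_{b=0}^{1} \frac{1}{2^{m-1}} \sum_{x \in \mathcal{X}_b^j} E\left[ \log \frac{q_j(b_j(x), Y)^s}{\frac{1}{2} \sum_{b'=0}^{1} q_j(b_j = b', Y)^s} \right], \tag{66}$$

where the inner expectation is over $Y$ given the symbol $x$. Since averaging $p(y|x)$ over $x \in \mathcal{X}_b^j$ gives
$p_j(y|b)$, this amounts to the expectation being done according to the joint distribution
$p_j(b_j)p_j(y|b_j)$, with $p_j(b_j) = \frac{1}{2}$. ■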
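The decomposition in Theorem 2 holds for any positive bit metrics and any fixed s, which makes it easy to check numerically. Below is a minimal sketch assuming a discrete output alphabet, uniform inputs, and a labeling that enumerates all m-bit vectors. The function names, the channel matrix and the labels used in the example are hypothetical.

```python
import math


def gmi_symbol(w, labels, q_bits, s):
    """I_gmi(s) of Eq. (62): uniform inputs, symbol metric = product of bit metrics.

    w[x][y] = p(y|x); labels[x] is the m-bit label of x; q_bits[j][b][y] = q_j(b, y).
    """
    n_sym, n_out, m = len(w), len(w[0]), len(labels[0])
    q_sym = [[math.prod(q_bits[j][labels[x][j]][y] ** s for j in range(m))
              for y in range(n_out)] for x in range(n_sym)]
    total = 0.0
    for y in range(n_out):
        denom = sum(q_sym[x][y] for x in range(n_sym)) / n_sym
        total += sum(w[x][y] * math.log(q_sym[x][y] / denom)
                     for x in range(n_sym)) / n_sym
    return total


def gmi_bits(w, labels, q_bits, s):
    """Sum over label positions of the bit-level terms in Eq. (61), for fixed s."""
    n_sym, n_out, m = len(w), len(w[0]), len(labels[0])
    total = 0.0
    for j in range(m):
        for y in range(n_out):
            denom = 0.5 * (q_bits[j][0][y] ** s + q_bits[j][1][y] ** s)
            total += sum(w[x][y] * math.log(q_bits[j][labels[x][j]][y] ** s / denom)
                         for x in range(n_sym)) / n_sym
    return total


def classical_bit_metrics(w, labels):
    """Eq. (7): q_j(b, y) as the sum of p(y|x) over symbols whose j-th bit is b."""
    m, n_out = len(labels[0]), len(w[0])
    return [[[sum(w[x][y] for x in range(len(w)) if labels[x][j] == b)
              for y in range(n_out)] for b in (0, 1)] for j in range(m)]


if __name__ == "__main__":
    labels = [(0, 0), (0, 1), (1, 0), (1, 1)]
    w = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]
    q_bits = classical_bit_metrics(w, labels)
    print(gmi_symbol(w, labels, q_bits, 1.0), gmi_bits(w, labels, q_bits, 1.0))
```

The two printed values should agree up to rounding; the same agreement is expected for other values of s and for other positive bit metrics, since only the factorization of the denominator is used.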

There are a number of interesting particular cases of the above theorem. 

Corollary 1: For the classical BICM decoder with metric in Eq. (7),

$$I_{\rm gmi} = C^{\rm bicm}_{\mathcal{X}}. \tag{67}$$

Proof: Since the metric $q_j(b_j, y)$ is proportional to $p_j(y|b_j)$, we can identify the quantity

$$E\left[ \log \frac{q_j(B_j, Y)^s}{\frac{1}{2} \sum_{b'=0}^{1} q_j(b', Y)^s} \right] \tag{68}$$

as the generalized mutual information of a matched binary-input channel with transitions $p_j(y|b_j)$.
Then, the supremum over $s$ is achieved at $s = 1$ and we get the desired result.

Corollary 2: For the max-log metric in Eq. (11),

$$I_{\rm gmi} = \sup_{s > 0} \sum_{j=1}^{m} E\left[ \log \frac{\bigl( \max_{x' \in \mathcal{X}^j_{B_j}} p(Y|x') \bigr)^s}{\frac{1}{2} \sum_{b=0}^{1} \bigl( \max_{x' \in \mathcal{X}^j_{b}} p(Y|x') \bigr)^s} \right]. \tag{69}$$

Szczecinski et al. studied the mutual information with this decoder [16], using this formula for
$s = 1$. Clearly, the optimization over $s$ may induce a larger achievable rate, as we see in the
next section. More generally, as we shall see later, letting $s = \frac{1}{1+\rho}$ in the mismatched error
exponent can yield some degradation.

E. BICM with mismatched decoding: numerical results 

The data-processing inequality for error exponents yields $E_0^{\rm bicm}(\rho, s) \le E_0^{\rm cm}(\rho)$, where the
quantity on the right-hand side is the coded modulation exponent. On the other hand, no general
relationship holds between $E_0^{\rm ind}(\rho)$ and $E_0^{\rm cm}(\rho)$. As will be illustrated in the following
examples, there are cases for which $E_0^{\rm cm}(\rho)$ can be larger than $E_0^{\rm ind}(\rho)$, and vice versa.

Figures 2, 3 and 4 show the error exponents for coded modulation (solid), BICM with
independent parallel channels (dashed), BICM using mismatched metric (7) (dash-dotted), and
BICM using mismatched metric (11) (dotted) for 16-QAM with Gray mapping, Rayleigh fading
and $\mathsf{snr} = 5, 15, -25$ dB, respectively. Dotted lines labeled with $s = \frac{1}{1+\rho}$ correspond to the
error exponent of BICM using mismatched metric (11) letting $s = \frac{1}{1+\rho}$. The parallel-channel
model gives a larger exponent than the coded modulation, in agreement with the cutoff rate
results of [8]. In contrast, the mismatched-decoding analysis yields a lower exponent than coded
modulation. As mentioned in the previous section, both BICM models yield the same capacity.

In most cases, BICM with a max-log metric (11) incurs a marginal loss in the exponent
for mid-to-large SNR. In this SNR range, the optimized exponent and that with $s = \frac{1}{1+\rho}$ are
almost equal. For low SNR, the parallel-channel model and the mismatched-metric model with
(7) have the same exponent, while we observe a larger penalty when metrics (11) are used. As
we observe, some penalty is incurred at low SNR for not optimizing over $s$. We denote with
crosses the corresponding achievable information rates.

An interesting question is whether the error exponent of the parallel-channel model is always
larger than that of the mismatched decoding model. The answer is negative, as illustrated in
Figure 5, which shows the error exponents for coded modulation (solid), BICM with independent
parallel channels (dashed), BICM using mismatched metric (7) (dash-dotted), and BICM using
mismatched metric (11) (dotted) for 8-PSK with Gray mapping in the unfaded AWGN channel.


V. Extrinsic Side Information 

Besides the classical decoder described in Section II, iterative decoders have also received
much attention [3], [4], [5], [6], [7] due to their improved performance. Iterative decoders can
also be modelled as mismatched decoders, where the bit decoding metric is now of the form

$$q_j(b, y) = \sum_{x' \in \mathcal{X}_b^j} p(y|x') \prod_{j' \ne j} {\rm ext}_{j'}\bigl(b_{j'}(x')\bigr), \tag{70}$$

where we denote by ${\rm ext}_j(b)$ the extrinsic information, i.e., the "a priori" probability that the $j$-th
bit takes the value $b$. Extrinsic information is commonly generated by the decoder of the binary
code $\mathcal{C}$. Clearly, we have that ${\rm ext}_j(0) + {\rm ext}_j(1) = 1$, and $0 \le {\rm ext}_j(0), {\rm ext}_j(1) \le 1$. Without
extrinsic information, we take ${\rm ext}_j(0) = {\rm ext}_j(1) = \frac{1}{2}$, and the metric is given by Eq. (7), up to an
irrelevant constant factor.

In the analysis of iterative decoding, extrinsic information is often modeled as a set of random
variables ${\rm EXT}_j(0)$, where we have defined without loss of generality the variables with respect
to the all-zero symbol. We denote the joint density function by $p({\rm ext}_1(0), \ldots, {\rm ext}_m(0)) =
\prod_{j=1}^{m} p({\rm ext}_j(0))$. We discuss later how to map the actual extrinsic information generated in
the decoding process onto this density. The mismatched decoding error exponent $E_0^q(\rho, s)$ for
metric (70) is given by Eq. (42), where the expectation is now carried out according to the
joint density $p(x)p(y|x)p({\rm ext}_1(0)) \cdots p({\rm ext}_m(0))$. Similarly, the generalized mutual information
is again obtained as $I_{\rm gmi} = \max_s \lim_{\rho \to 0} \frac{E_0^q(\rho, s)}{\rho}$.



It is often assumed [5] that the decoding metric acquires knowledge on the symbol x effectively 
transmitted, in the sense that for any symbol x' G X, the j-th bit decoding metric is given by 

q j (b j (x') = b,y)= PivW) II exS>i>{W) © M^)), (71) 

where © denotes the binary addition. Observe that extrinsic information is defined relative to the 
transmitted symbol x, rather than relative to the all-zero symbol. If the j-th bit of the symbols x" 
and x coincide, the extrinsic information for bit zero extj(O) is selected, otherwise the extrinsic 
information extj(l) is used. 
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For the metric in Eq. fTT] ), the proof presented in Section IV-B of the data processing inequality 



fails because the integrand in Eq. (|45j) cannot be decomposed into a product of separate terms 
respectively depending on x and x', the reason being that the metric q(x', y) varies with x. 

On the other hand, since the symbol metric q(x',y) is the same for all symbols x', the 
decomposition of the generalized mutual information as a sum of generalized mutual informations 
across the m bit labels in Theorem [2] remains valid, and we have therefore 



(72) 



2 Ef/=o<?#i = b',Y)_ 

This expectation is carried out according to $p(b_j)\,p_j(y|b_j)\,p(\mathrm{ext}_1(0)) \cdots p(\mathrm{ext}_m(0))$, with $p(b_j) = \frac{1}{2}$.
Each of the summands can be interpreted as the mutual information achieved by non-uniform
signalling in the constellation set $\mathcal{X}$, where the probabilities according to which the symbols are
drawn are a function of the extrinsic informations $\mathrm{ext}_j(\cdot)$. The value of $I^{\mathrm{gmi}}$ may exceed the
channel capacity [5], so this quantity is a pseudo-generalized mutual information, with the same
functional form but lacking operational meaning as a rate achievable by the decoder.
Alternatively, the metric in Eq. (70) may depend on the hypothesized symbol $x'$, that is

$q_j(b_j(x') = b, y) = \sum_{x'' \in \mathcal{X}_b^j} p(y|x'') \prod_{j' \neq j} \mathrm{ext}_{j'}\bigl(b_{j'}(x'') \oplus b_{j'}(x')\bigr)$. (73)



Unlike in Eq. (71), the bit metric varies with the hypothesized symbol $x'$ and not with
the transmitted symbol $x$. Therefore, Theorem 2 cannot be applied and the generalized mutual
information cannot be expressed as a sum of mutual informations across the bit labels. On the
other hand, the data processing inequality holds and, in particular, the error exponent and the
generalized mutual information are upper bounded by those of coded modulation. Moreover, we
have the following result.

Theorem 3: In the presence of perfect extrinsic side information, the error exponent with
metric (73) coincides with that of coded modulation.



Proof: With perfect extrinsic side information, all the bits $j' \neq j$ are known, and then

$\prod_{j' \neq j} \mathrm{ext}_{j'}\bigl(b_{j'}(x'') \oplus b_{j'}(x')\bigr) = \begin{cases} 1 & \text{when } x'' = x', \\ 0 & \text{otherwise,} \end{cases}$ (74)

which guarantees that only the symbol $x'' = x'$ is selected. Then, $q_j(b_j(x') = b, y) = p(y|x')$ and
the symbol metric becomes $q(x', y) = p(y|x')^m$ for all $x' \in \mathcal{X}$. As we showed in Eq. (51), this
is precisely the condition under which the error exponent (and the capacity) with mismatched
decoding coincides with that of coded modulation. ■
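
The collapse of the metric used in the proof can be checked numerically. The sketch below is illustrative only: the 4-PAM constellation, its Gray labelling, the real AWGN likelihood, and all function names are assumptions, not taken from the paper. With perfect extrinsic information, modelled here as $\mathrm{ext}_j(0) = 1$ and $\mathrm{ext}_j(1) = 0$, each bit metric of the form (73) reduces to $p(y|x')$ and the symbol metric to $p(y|x')^m$.

```python
import math

# Hypothetical Gray labelling of a 4-PAM constellation (m = 2 bits per symbol).
LABELS = {-3.0: (0, 0), -1.0: (0, 1), 1.0: (1, 1), 3.0: (1, 0)}
M_BITS = 2
# Perfect extrinsic information: ext_j(0) = 1 and ext_j(1) = 0 for every bit j.
PERFECT_EXT = [(1.0, 0.0)] * M_BITS


def channel_density(y, x, noise_var=1.0):
    """Likelihood p(y|x) of a real AWGN channel (an assumption of this sketch)."""
    return math.exp(-(y - x) ** 2 / (2.0 * noise_var)) / math.sqrt(2.0 * math.pi * noise_var)


def bit_metric_73(j, x_hyp, y, ext):
    """Bit metric in the spirit of Eq. (73) for bit j of the hypothesized symbol."""
    b = LABELS[x_hyp][j]
    total = 0.0
    for x2, label2 in LABELS.items():
        if label2[j] != b:
            continue  # sum only over symbols whose j-th bit equals b
        weight = 1.0
        for jp in range(M_BITS):
            if jp != j:
                # Extrinsic value indexed by XOR with the hypothesized symbol's bit.
                weight *= ext[jp][label2[jp] ^ LABELS[x_hyp][jp]]
        total += channel_density(y, x2) * weight
    return total


def symbol_metric_73(x_hyp, y, ext):
    """Symbol metric: product of the m bit metrics."""
    metric = 1.0
    for j in range(M_BITS):
        metric *= bit_metric_73(j, x_hyp, y, ext)
    return metric


def max_collapse_error(y):
    """Largest gap between q(x', y) and p(y|x')^m over the constellation."""
    return max(
        abs(symbol_metric_73(x, y, PERFECT_EXT) - channel_density(y, x) ** M_BITS)
        for x in LABELS
    )


print(max_collapse_error(0.4))
```

The printed gap is zero up to floating-point rounding, since only the term $x'' = x'$ survives in each sum.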
The above result suggests that the gap between the
error exponent (and mutual information) of BICM and that of coded modulation can be closed
if one could provide perfect extrinsic side information to the decoder. A direct consequence of this result
is that the generalized mutual information with BICM metric (70) and perfect extrinsic side
information is equal to the mutual information of coded modulation. An indirect consequence
is that multi-stage decoding [17], [11] does not attain the error exponent of coded
modulation, even though its corresponding achievable rate is the same. The reason is that the
decoding metric is not of the form $c\,p(y|x)^s$, for some constants $c$ and $s$, except for the last bit in
the decoding sequence. We hasten to remark that the above rate in the presence of perfect extrinsic
side information need not be achievable, in the sense that there may not exist a mechanism
for accurately feeding the quantities $\mathrm{ext}_j(b)$ to the demapper. Moreover, the actual link to the
iterative decoding process is open for future research.
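
The role of the form $c\,p(y|x)^s$ can be seen from a short calculation. It is sketched here under assumptions not stated in this section: a memoryless channel, codewords $\boldsymbol{x} = (x_1, \ldots, x_N)$ of block length $N$, a decoder that maximizes the product of symbol metrics, and $c > 0$, $s > 0$.

```latex
\begin{align}
\hat{\boldsymbol{x}}
  &= \arg\max_{\boldsymbol{x}} \prod_{k=1}^{N} c\, p(y_k|x_k)^{s} \\
  &= \arg\max_{\boldsymbol{x}} \; c^{N} \Bigl( \prod_{k=1}^{N} p(y_k|x_k) \Bigr)^{s} \\
  &= \arg\max_{\boldsymbol{x}} \prod_{k=1}^{N} p(y_k|x_k),
  \qquad c > 0,\ s > 0 .
\end{align}
```

Under these assumptions, a metric of this form yields the same decisions as maximum-likelihood decoding, since multiplying by $c^N$ and raising to the power $s$ are both monotonically increasing operations. The symbol metric $p(y|x')^m$ obtained in the proof of Theorem 3 corresponds to $c = 1$ and $s = m$.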

VI. Conclusions 

We have presented a mismatched-decoding analysis of BICM, which is valid for arbitrary 
finite-length interleavers. We have proved that the corresponding generalized mutual information 
coincides with the BICM capacity originally given by Caire et al. modeling BICM as a set 
of independent parallel channels. More generally, we have seen that the error exponent cannot 
be larger than that of coded modulation, contrary to the analysis of BICM as a set of parallel 
channels. For Gaussian channels with binary reflected Gray mapping, the gap between the BICM 
and CM error exponents is small, as found by Caire et al. for the capacity. We have also seen 
that the mutual information appearing in the analysis of iterative decoding of BICM via EXIT 
charts admits a representation as a form of generalized mutual information. However, since this 
quantity may exceed the capacity, its operational meaning as an achievable rate is unclear. We 
have modified the extrinsic side information available to the decoder, to make it dependent on the
hypothesized symbol rather than on the transmitted one, and shown that the corresponding error
exponent is always upper bounded by that of coded modulation. In the presence of perfect extrinsic
side information, both error exponents coincide. The actual link to the iterative decoding process
is open for future research.
Fig. 1. Parallel channel model of BICM.
Fig. 2. Error exponents for coded modulation (solid), BICM with independent parallel channels (dashed), BICM using
mismatched metric (7) (dash-dotted), and BICM using mismatched metric (11) (dotted) for 16-QAM with Gray mapping,
Rayleigh fading and snr = 5 dB.
Fig. 3. Error exponents for coded modulation (solid), BICM with independent parallel channels (dashed), BICM using
mismatched metric (7) (dash-dotted), and BICM using mismatched metric (11) (dotted) for 16-QAM with Gray mapping,
Rayleigh fading and snr = 15 dB.
Fig. 4. Error exponents for coded modulation (solid), BICM with independent parallel channels (dashed), BICM using
mismatched metric (7) (dash-dotted), and BICM using mismatched metric (11) (dotted) for 16-QAM with Gray mapping,
Rayleigh fading and snr = -25 dB. Crosses correspond to (from right to left) coded modulation, BICM with metric (7), BICM
with metric (11) and BICM with metric (11) with a = 1.
Fig. 5. Error exponents for coded modulation (solid), BICM with independent parallel channels (dashed), BICM using
mismatched metric (7) (dash-dotted), and BICM using mismatched metric (11) (dotted) for 8-PSK with Gray mapping, AWGN
and snr = 5 dB.